Conversation
Code Review
This pull request introduces a new example, wasm-gating, which demonstrates capability-based routing between baseline and subgroup WebGPU WASM builds. The addition includes a TypeScript implementation, HTML structure, and documentation. Feedback was provided to generalize the comments regarding logit_bias token IDs, as the current descriptions are specific to a particular model (Llama-3.1-8B-Instruct) and could be misleading if the model or tokenizer is updated.
```typescript
const reply0 = await engine.chat.completions.create({
  messages: [{ role: "user", content: "List three US states." }],
  // below configurations are all optional
  n: 3,
  temperature: 1.5,
  max_tokens: 256,
  // 46510 and 7188 are "California", and 8421 and 51325 are "Texas" in Llama-3.1-8B-Instruct
  // So we would have a higher chance of seeing the latter two, but never the first in the answer
  logit_bias: {
    "46510": -100,
    "7188": -100,
    "8421": 5,
    "51325": 5,
  },
  logprobs: true,
  top_logprobs: 2,
});
```
The comments explaining the specific token IDs for "California" and "Texas" are highly model-dependent (Llama-3.1-8B-Instruct). This makes the example less portable and the comments could quickly become outdated or misleading if the model or tokenizer changes. Consider making these comments more generic about the purpose of logit_bias rather than detailing specific token values, or moving such model-specific details to external documentation if necessary.
```typescript
// Example of using logit_bias to influence token generation.
// Specific token IDs and their corresponding words are model-dependent.
logit_bias: {
  "46510": -100,
  "7188": -100,
  "8421": 5,
  "51325": 5,
},
```

```typescript
const modelRecord = webllm.prebuiltAppConfig.model_list.find(
  (entry) => entry.model_id === selectedModel,
);
const appConfig =
```
We also want to enforce `subgroupMinSize <= 32 <= subgroupMaxSize` and `maxComputeInvocationsPerWorkgroup = 1024` for the subgroup wasm path.
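One way to combine these checks is a single gating predicate over the adapter-reported values. The sketch below is hypothetical (the helper name and `AdapterCaps` shape are not from the PR); it assumes the values would come from `adapter.features`, the subgroup size fields exposed by the WebGPU subgroups proposal, and `adapter.limits.maxComputeInvocationsPerWorkgroup`, and is written as a pure function so it can be tested without a real `GPUAdapter`:

```typescript
// Hypothetical capability snapshot taken from a WebGPU adapter.
interface AdapterCaps {
  features: Set<string>; // e.g. names reported by adapter.features
  subgroupMinSize: number; // smallest subgroup size the adapter supports
  subgroupMaxSize: number; // largest subgroup size the adapter supports
  maxComputeInvocationsPerWorkgroup: number; // from adapter.limits
}

// Gate the subgroup wasm path on all of the checks above:
// the "subgroups" feature, a reachable subgroup size of 32,
// and support for 1024-invocation workgroups.
function canUseSubgroupWasm(caps: AdapterCaps): boolean {
  return (
    caps.features.has("subgroups") &&
    caps.subgroupMinSize <= 32 &&
    32 <= caps.subgroupMaxSize &&
    caps.maxComputeInvocationsPerWorkgroup >= 1024
  );
}
```

If any check fails, the caller would fall back to the baseline `.wasm` build rather than erroring out.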
Summary
- Adds an `examples/wasm-gating` example showing how to route between baseline and subgroup WebGPU WASM libraries in WebLLM
- Uses `adapter.features.has("subgroups")` for `model_lib` selection based on WebGPU adapter support
- Selects `-subgroups.wasm` when subgroup support is available

Testing
- `model_lib` switches from `.wasm` to `-subgroups.wasm` when `subgroups` is reported by the adapter
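The suffix switch exercised in testing can be sketched as a small helper. This is an illustrative function (the name and exact rewrite rule are assumptions, not the example's actual code):

```typescript
// Hypothetical helper: derive the model_lib URL from a baseline .wasm
// URL and the adapter's reported subgroup support. Mirrors the routing
// the wasm-gating example performs.
function selectModelLib(baseLibUrl: string, hasSubgroups: boolean): string {
  if (!hasSubgroups) return baseLibUrl;
  // Swap the trailing ".wasm" for "-subgroups.wasm".
  return baseLibUrl.replace(/\.wasm$/, "-subgroups.wasm");
}
```

Keeping the decision in one place like this makes the gating easy to unit-test with mocked adapter capabilities.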